Abstract

We model the process of communicating in the presence of interference,
which is unknown or hostile, as a two-person zero-sum game with the
communicator and the jammer as the players. The objective function we consider is
the mutual information. The communicator's strategies are distributions on
the input alphabet and on a set of quantizers. The jammer's strategies are
distributions on the noise power subject to certain constraints. We consider
various conditions on the jammer's strategy set and on the communicator's
knowledge. For the case with the decoder uninformed of the actual quantizer
chosen we show that, from the communicator's perspective, the worst-case
jamming strategy is a distribution concentrated at a finite number of points,
thereby converting a functional optimization problem into a non-linear
programming problem. Moreover, we are able to characterize the worst-case
distributions by means of necessary and sufficient conditions which are easy
to verify. For the case with the decoder informed of the actual quantizer
chosen we are able to demonstrate the existence of saddle-point strategies.
The analysis is also seen to be valid for a number of situations where the
jammer is adaptive.

1 Introduction

The  applicability  of  game-theoretic  models  in  jamming  situations  is  by  now 
well  established  [Blac  57],  [Root  61],  [McEl  83a],  [McEl  83b],  [Star  82],  [Chan 
85],  [Peng  86].  In  this  paper  we  formulate  fairly  general  models  for  a  number  of 
jamming situations as two-person zero-sum games between the communicator and
the  jammer.  We  allow  the  jammer  the  choice  of  one  of  a  set  of  noise  distributions 
satisfying  peak  and  average  power  constraints.  By  way  of  counter-measure  the 
communicator  is  allowed  to  randomize  the  input  symbols  as  well  as  randomize 
the  quantizer  at  the  output  side.  We  intend  the  analysis  to  be  applicable  to  the 
performance  of  soft  decision  decoding  for  jammed  channels. 

Typically in a spread-spectrum channel the performance in additive white Gaussian
noise is identical to the performance of non-spread systems; namely, the bit
error probability decreases exponentially with signal-to-noise ratio. However, when
subject to worst-case partial-band or pulsed jamming (wherein power is concentrated
in time or frequency to affect only a fraction of the symbols transmitted
while allowing the remainder to be received "error-free") the bit error probability
of a spread-spectrum system decreases only inverse linearly with the signal-to-noise
ratio. This is a significant degradation, typically of the order of 30-40 dB for a bit
error probability on the order of $10^{-5}$.
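The size of this degradation can be checked with a short calculation. The sketch below is illustrative only: it assumes noncoherent binary FSK (a modulation not specified in the text), for which the bit error probability in broadband Gaussian noise is $\tfrac{1}{2}e^{-\mathrm{snr}/2}$, and a pulsed jammer that is on for a fraction $\rho$ of the time with its noise density scaled by $1/\rho$. The function names are hypothetical.

```python
import math

def ber_broadband(snr):
    """Noncoherent BFSK in broadband Gaussian noise: P_b = 0.5 * exp(-snr / 2)."""
    return 0.5 * math.exp(-snr / 2.0)

def ber_worst_case_pulsed(snr):
    """Pulsed jammer on for a fraction rho of the time with noise density N_J / rho.
    P_b(rho) = (rho / 2) * exp(-rho * snr / 2) is maximized at rho = min(1, 2 / snr)."""
    rho = min(1.0, 2.0 / snr)
    return 0.5 * rho * math.exp(-rho * snr / 2.0)

def required_snr_db(ber_fn, target, lo=1e-3, hi=1e9):
    """Bisection on a log scale for the snr at which a decreasing BER curve meets target."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if ber_fn(mid) > target:
            lo = mid
        else:
            hi = mid
    return 10.0 * math.log10(hi)

broadband_db = required_snr_db(ber_broadband, 1e-5)
pulsed_db = required_snr_db(ber_worst_case_pulsed, 1e-5)
degradation_db = pulsed_db - broadband_db
print(f"broadband {broadband_db:.1f} dB, pulsed {pulsed_db:.1f} dB, "
      f"degradation {degradation_db:.1f} dB")
```

Under these assumptions the worst-case curve falls off as $e^{-1}/\mathrm{snr}$, which is the inverse linear behaviour described above.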

To remedy this situation most systems use some form of error-correction coding.
For example, it can be shown that with a hard decision decoder if the code rate
is small (< 1/2) and the jammer is allowed to pulse between several Gaussian
distributions then there is no loss in signal-to-noise ratio necessary for reliable
communications compared to an additive Gaussian noise channel with the same
(average) power. So it can be said that coding (with hard decision demodulation)
neutralizes a (power constrained) jammer (i.e., makes the performance the same
as that of an additive white Gaussian noise channel) [Stark 85a], [Ma 84]. It can also
be shown that the worst-case jamming strategy is to pulse between two zero-mean
Gaussian noise distributions, one of which has zero variance.

As  has  been  well  known  in  the  communication  field,  hard  decision  decoding 
loses  roughly  2  dB  in  signal-to-noise  ratio  compared  to  soft  decision  decoding. 
Thus  considerable  interest  has  focused  on  soft-decision  decoding.  One  problem 
that has been observed is that if a (soft) decoding algorithm designed for a
non-jammed channel is used for a jammed channel then the performance is extremely
poor  when  the  jamming  strategy  is  optimized.  One  method  for  “overcoming”  this 
difficulty  is  to  assume  the  jamming  noise  is  one  of  two  distributions  (usually  one 
having  zero  variance  called  the  “off”  state  and  the  other  called  the  “on”  state) 
and  that  the  decoder  knows  when  the  jammer  is  “on”  and  when  the  jammer  is 
“off”.  Using  this  side  information,  similar  results  to  the  hard  decision  case  have 
been  obtained  for  the  soft  decision  case  [Simo  85]  (for  small  rates  there  is  no  loss 
in  performance).  However  assuming  this  information  is  available  is  assuming  away 
the  problem.  Most  systems  analyses  do  not  incorporate  jamming  strategies  that 
affect  the  reliability  of  the  side  information. 

Thus there has been considerable interest over the last few years in decoding
algorithms that do not assume side information and do not do hard decision
decoding. However, most of these algorithms still assume the jammer pulses between
one of two levels. In this paper we investigate the case of a decoder that processes
symbols from a finite alphabet and where the only constraints on the jammer are
average and peak power. We formulate the problem as a game with two players:
the jammer, whose strategy set consists of distributions on the power of the
jamming noise, and the communicator, whose strategy set consists of a pair of
distributions, one on the input alphabet and one on the set of quantizers. We look for
worst-case jamming strategies and investigate when the game admits a saddle
point. We do the analysis using mutual information as our objective function.

We consider a modulator that transmits one out of $M$ signals. This transmitted
signal is denoted by the random variable $X$. The received signal, which
has been corrupted by the jammer in some fashion, is demodulated and quantized
into one of $L$ values. In order to prevent the jammer from using knowledge of
the quantizer in designing his worst-case strategy, we allow randomization of the
quantizer over some given set of quantizers. Clearly such randomization increases
the size of the communicator's strategy set. Thus, we view this situation as
a game with two players: the jammer and the communicator. The jammer selects
the noise in the channel and the communicator chooses the encoder, the decoder
and the quantizer. The strategy set for the jammer is the set of all distributions on
the power of the jamming noise subject to the given constraints on the peak and
average power. The strategy set for the communicator is the set of all distributions
on the input alphabet and on the set of quantizers.

For  this  general  set  up  we  show  that  the  worst  case  jamming  strategy  from  the 
communicator’s  perspective  is  to  pulse  between  a  finite  number  of  power  levels. 
We  also  consider  the  case  of  random  decoding  strategies  where  the  demodulator 
output  is  quantized  into  a  finite  number  of  outputs  by  a  randomized  quantizer, 
i.e.,  the  quantization  thresholds  are  random. 

For this case we show that the optimal randomized quantizer can perform better
than the nonrandomized quantizer and that from the jammer's point of view
the worst-case distribution of the thresholds is concentrated on a finite number of
points. Our basic model can be easily seen to fit a frequency-hop communication
system in which the modulation utilizes an $M$-ary signal set, which in many cases
consists of orthogonal signals. The spread-spectrum bandwidth is divided into a large
number of frequency slots. Each possible modulated signal is hopped from frequency
slot to frequency slot using a pseudo-random hopping pattern. During each hop
one of the $M$ signals is transmitted. There are two important special cases. First,
all modulated signals use the same hopping pattern and second, each signal has
its own hopping pattern. The demodulator is a coherent or noncoherent matched
filter whose output is then quantized to a finite number of values.

The remainder of the paper is organized as follows. In Section 2 we define the
models we will be considering and give examples for which our models apply. In
Sections 3 and 4 we derive our results concerning the worst-case jamming strategy
and the optimal quantizer strategy for the cases with decoder uninformed about
the actual quantizer chosen and with decoder informed about the actual quantizer
chosen, respectively. Finally in Section 5 we discuss our results and state our
conclusions and extensions.

2  Channel  Models 

In this section we describe the models we use in the subsequent analysis. In
all cases we consider a modulator that transmits one out of $M$ signals in $D$
dimensions ($D \le M$). This transmitted signal is denoted by the random variable
$X$. The received signal, which is corrupted by the jammer in some fashion, is
demodulated and quantized into one of $L$ values. The received signal is denoted
by the random variable $Y$. ($Y$ can also be a random vector without changing any
of the following analysis.)

The general philosophy that we will use is that of game theory with the players
being the jammer and the communicator. The jamming strategies are distributions
$dF$ on $D$ random variables, $Z_1, Z_2, \ldots, Z_D$. These random variables represent
the power of the jammer in each of the signal dimensions and are modelled as
modulating a generic noise variable present in the channel. For example, if $D = 1$
and $N$ is a zero-mean, variance 1 Gaussian random variable then the jammer's
noise may be of the form $Z_1 N$. The jammer, however, has an average power
constraint and a peak power constraint. More generally the jammer is constrained by

\[ \int f(z_1, z_2, \ldots, z_D)\, dF(z_1, z_2, \ldots, z_D) \le K_J \tag{1} \]

and

\[ 0 \le Z_j \le b_j, \qquad j = 1, \ldots, D \tag{2} \]

where $b_j$ is the peak power constraint and $f(z_1, \ldots, z_D)$ is some continuous
functional of $(z_1, \ldots, z_D)$. For average power constrained channels with no peak
constraint we let $b_j$ become very large.

The output of the demodulator is quantized into one of $L$ values, say $0, 1, \ldots, L-1$.
The output of the quantizer, $Y$, is also the output of the channel for coding.
The strategies for the communicator are to choose a distribution, $dG(\theta)$, on the
quantization thresholds and a distribution, $dP(x)$, on the input alphabet. We will
let $\theta$ parametrize the quantizers and assume that $\theta$ ranges over $\Theta$, some compact
subset of $\mathbb{R}$ ($\Theta$ will also be used to denote the corresponding random variable, and
$\theta$ a particular realization).
For each $(z_1, \ldots, z_D)$ and $\theta$ there is a probability distribution on the output of the
channel given the input of the channel:

\[ \mathrm{Prob}\{Y = y \mid X = x,\ \Theta = \theta,\ Z_1 = z_1,\ Z_2 = z_2, \ldots, Z_D = z_D\} = p(y \mid x, \theta, z_1, z_2, \ldots, z_D). \tag{3} \]

The  above  model  describes  the  input  output  relation  of  the  channel  for  a  particular 
symbol.  In  addition  we  model  the  channel  as  being  memoryless. 

We now introduce some notation. Let:

$A = \{0, 1, \ldots, M-1\}$ be the input alphabet,

$B = \{0, 1, \ldots, L-1\}$ be the output alphabet,

$\Theta$ be the quantizer parameter space (some compact subset of $\mathbb{R}$),

$Z$ be $(Z_1, \ldots, Z_D)$, $(0 \le Z_i \le b_i)$,

$p(y|x,\theta,z)$, the transition probability from $x$ to $y$ given $\theta$, $z$, and

$P_{yx}(\theta,z)$ the corresponding stochastic matrix, $P_{yx}(\theta,z) = [p(y|x,\theta,z)]$.

We assume that

(i) $p(y|x,\theta,z)$ is continuous in $z$ for all $\theta$, $x$ and

(ii) $p(y|x,\theta,z)$ is continuous in $\theta$ for all $x$, $z$.

Let $S$ denote the set of all probability distributions on the Borel sets of
$K = \{z = (z_1, \ldots, z_D) : 0 \le z_i \le b_i\}$, and

\[
\begin{aligned}
I(G,P;F) &= I\Big(\int_K \int_\Theta P_{yx}(\theta,z)\, dG(\theta)\, dF(z)\Big) \\
         &= I\Big(\int_K P_{yx}(z)\, dF(z)\Big) \\
         &= I\big(P_{yx}(G,F)\big)
\end{aligned}
\tag{4}
\]

where $P_{yx}(z) = \int_\Theta P_{yx}(\theta,z)\, dG(\theta)$ and $I(P_{yx}(G,F))$ is the mutual
information between $X$ and $Y$ when they are related by the stochastic matrix $P_{yx}$.
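To make definition (4) concrete, the following sketch evaluates $I(G,P;F)$ for discrete strategies. The channel is a hypothetical instance chosen only for illustration: $M = 2$ antipodal signals $\pm 1$, additive Gaussian noise of power $z$, and an $L = 3$ level quantizer with thresholds $\pm\theta$. None of these specifics, nor the function names, come from the text.

```python
import math

def gaussian_cdf(t, mean, power):
    """P(mean + sqrt(power) * N <= t) for a standard Gaussian N; power = 0 is noiseless."""
    if power == 0.0:
        return 1.0 if t >= mean else 0.0
    return 0.5 * math.erfc(-(t - mean) / math.sqrt(2.0 * power))

def transition_matrix(theta, z):
    """p(y | x, theta, z): rows are the inputs (signals -1, +1), columns the 3 quantizer levels."""
    rows = []
    for signal in (-1.0, 1.0):
        lo = gaussian_cdf(-theta, signal, z)
        hi = gaussian_cdf(theta, signal, z)
        rows.append([lo, hi - lo, 1.0 - hi])
    return rows

def averaged_matrix(G, F):
    """P_yx(G, F): average of P_yx(theta, z) over discrete dG and dF, each a list of (value, prob)."""
    avg = [[0.0] * 3 for _ in range(2)]
    for theta, g in G:
        for z, f in F:
            P = transition_matrix(theta, z)
            for x in range(2):
                for y in range(3):
                    avg[x][y] += g * f * P[x][y]
    return avg

def mutual_information(p_x, P):
    """I(X;Y) in bits for input distribution p_x and stochastic matrix P (rows indexed by x)."""
    q = [sum(p_x[x] * P[x][y] for x in range(len(p_x))) for y in range(len(P[0]))]
    total = 0.0
    for x, px in enumerate(p_x):
        for y, pyx in enumerate(P[x]):
            if px > 0.0 and pyx > 0.0:
                total += px * pyx * math.log2(pyx / q[y])
    return total

G = [(0.2, 0.5), (0.6, 0.5)]    # randomize between two thresholds
F = [(0.0, 0.8), (2.5, 0.2)]    # jammer off 80% of the time; average power 0.5
print(mutual_information([0.5, 0.5], averaged_matrix(G, F)))
```

Note that the averaging over $dG$ and $dF$ happens inside the argument of $I(\cdot)$, which is what distinguishes the uninformed-decoder case from an expectation of mutual informations.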

The performance measure we are interested in is the largest rate such that
nearly error-free communication can be achieved, i.e. channel capacity. Another
performance measure of interest is the channel cutoff rate, $R_0$, since many researchers
believe this to be a practical limit to the set of rates for which reliable communication
is possible. Similar results to those in this paper can be derived with $R_0$ as the
performance measure (see [Hegd 87]).


We  consider  two  different  information  structures  for  the  communicator: 

I. The decoder is unaware of the actual quantizer chosen but only knows the
distribution $dG(\theta)$ on the set of quantizers. The jammer knows only the set
of quantizers but not the distribution $dG(\theta)$ chosen by the communicator.
He is also aware that the decoder does not know the actual quantizer chosen.

II. The decoder knows the actual quantizer chosen. Again the jammer knows
only the set of quantizers. He also knows that the decoder is aware of
the actual quantizer chosen.

Case I is seen to apply to situations where, for reasons of implementation perhaps,
the decoding is fixed and not altered with the specific quantizer chosen. It may also
be viewed as worst-case in the sense that the decoder's knowledge of the specific
quantizer and the utilization of such knowledge can only improve the
communicator's performance. When there is no randomization of the quantizer, i.e. the
quantizer is fixed, Cases I and II are the same and our results for both cases apply
to that situation. Also several special jamming strategies are of interest because
of correspondence with physical problems. We will classify the cases as follows.

A. Arbitrary joint distribution on $Z_1, Z_2, \ldots, Z_D$.

B. $Z_1 = Z_2 = \ldots = Z_D = Z$.

C. One dimensional jamming, i.e., at most one of the random variables $Z_i \ne 0$.

D. Independent jamming, i.e., $Z_1, Z_2, \ldots, Z_D$ are independent.

Case B corresponds to the physical situation where the jammer is not able to
place  different  amounts  of  power  in  different  dimensions  of  the  signal  space.  Case 
C  corresponds  to  the  case  where  only  one  of  the  dimensions  can  be  jammed  at  once. 
Case  D  corresponds  to  a  frequency-hop  communication  system  with  independent 
hopping  for  the  different  symbols.  The  standard  game  theoretic  description  is 
given  below. 

Communicator’s  Perspective 

The communicator is interested in the maximum rate at which information can be
reliably transmitted no matter what strategy the jammer employs. The communicator
designs his system assuming the jammer will somehow find out the strategy
he is using and then choose the worst possible distribution on the power levels. In
Case I the largest rate for which information can reliably be transmitted is

\[ \max_{dG,\,dP}\ \min_{dF}\ I(G,P;F) \]

where $I(G,P;F) = I(X;Y)$ when $(dG, dP)$ is chosen by the communicator and $dF$
is chosen by the jammer. That this is the maximum rate of reliable transmission
is well known since what we are dealing with is a compound channel with a finite
input alphabet and a finite output alphabet [Csiz 81, pgs. 172-173].


Jammer’s  Perspective 


The jammer is interested in the minimum rate such that information cannot be
reliably transmitted at any higher rate no matter what strategy the communicator
employs. The jammer designs his system assuming the communicator will
somehow find out the strategy he is using and then design the optimal
communication system. In Case I the smallest rate above which the jammer can
guarantee that reliable communication is impossible is

\[ \min_{dF}\ \max_{dG,\,dP}\ I(G,P;F). \]

That this is the smallest rate the jammer can guarantee is obvious since for each
$F$ the rate above which reliable communication is impossible is $\max_{dG,\,dP} I(G,P;F)$.

In Case II the appropriate mutual information can be written as an expectation of
the mutual information for a fixed $\theta$:

\[ I(G,P;F) = E_G\big(I(\theta,P;F)\big) \]

where $E_G$ refers to taking expectations w.r.t. $dG$ and $I(\theta,P;F) = I(X;Y \mid \theta)$.
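The difference between the two information structures can be illustrated numerically. The sketch below uses two hypothetical channel matrices, one per quantizer, each already averaged over the jammer's $dF$; the numbers are invented for illustration. Case I averages the matrices before computing mutual information, whereas Case II averages the mutual informations, so by convexity of $I(\cdot)$ in the channel matrix the informed decoder can do no worse.

```python
import math

def mutual_information(p_x, P):
    """I(X;Y) in bits for input distribution p_x and stochastic matrix P (rows indexed by x)."""
    q = [sum(p_x[x] * P[x][y] for x in range(len(p_x))) for y in range(len(P[0]))]
    total = 0.0
    for x, px in enumerate(p_x):
        for y, pyx in enumerate(P[x]):
            if px > 0.0 and pyx > 0.0:
                total += px * pyx * math.log2(pyx / q[y])
    return total

# Hypothetical channel matrices seen through two quantizers (illustrative values only).
P_THETA = {
    "theta_1": [[0.9, 0.1], [0.2, 0.8]],
    "theta_2": [[0.6, 0.4], [0.1, 0.9]],
}
G = {"theta_1": 0.5, "theta_2": 0.5}   # communicator's distribution on quantizers
p_x = [0.5, 0.5]                       # input distribution

# Case I: the decoder sees only the matrix averaged over dG.
P_avg = [[sum(G[t] * P_THETA[t][x][y] for t in G) for y in range(2)] for x in range(2)]
rate_uninformed = mutual_information(p_x, P_avg)

# Case II: the decoder knows theta, so the rate is E_G[I(theta, P; F)].
rate_informed = sum(G[t] * mutual_information(p_x, P_THETA[t]) for t in G)

print(rate_uninformed, rate_informed)
```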

In all of our analysis we assume that the jammer and the decoder/quantizer
have complete information about the set of strategies possible for each other so
that no secret information is considered. As mentioned previously, the performance
measure we consider is the largest rate such that reliable communication (in the
sense of arbitrarily small error probability) is possible. The type of channels we
are considering are known as compound channels. We consider the strategies
(distributions) of the jammer to be constant for a whole codeword as opposed to
(possibly) changing after each symbol of a codeword, which would correspond to an
arbitrarily varying channel. For compound channels the capacity with finite input
and output is well known to be the maximum of the minimum mutual information.
The minimum is over all possible transition probabilities and the maximum is over
all probability distributions on the input to the channel. Thus, using the maximum
of the minimum mutual information as the performance measure corresponds to the
largest rate such that reliable communication is possible no matter what strategy
the jammer employs. We are now ready to state the results. In brief, our results
show that when the decoder is informed of the quantization rule then (under
a compatibility assumption) there is a saddle point in cases A and B, i.e. the
jammer's rate and the communicator's rate are equal (Theorem 5). When
the decoder is not informed of the quantization rule, the jammer's rate and
the communicator's rate may differ. However, the optimal distributions, $F$ from
the communicator's point of view and $G$ from the jammer's point of view, are
finite dimensional (in all the cases A, B, C and D) (Theorem 1). This converts a
functional optimization problem into a finite-dimensional non-linear programming
problem.
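As a rough illustration of what such a finite-dimensional problem looks like, the sketch below treats a deliberately simple special case: binary antipodal signalling with a hard-decision ($L = 2$) quantizer, uniform inputs, and a jammer restricted to two-point "on/off" distributions under an average power constraint $K_J$ and peak constraint $b$. The restriction to two points, the Gaussian noise model, the parameter values and all names are assumptions made for the example; the general bound on the number of support points is the one given in Theorem 1.

```python
import math

def q_func(u):
    """Gaussian tail probability Q(u)."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def crossover(z):
    """Hard-decision error probability for signals +/-1 in Gaussian noise of power z."""
    return 0.0 if z == 0.0 else q_func(1.0 / math.sqrt(z))

def rate_two_point(z_on, K_J):
    """Mutual information (bits) when the jammer is off with probability 1 - rho and
    uses power z_on with probability rho = K_J / z_on (average power exactly K_J)."""
    rho = K_J / z_on
    return 1.0 - binary_entropy(rho * crossover(z_on))

K_J, b = 0.05, 4.0                                   # average and peak power (illustrative)
grid = [K_J + i * (b - K_J) / 2000 for i in range(2001)]
z_worst = min(grid, key=lambda z: rate_two_point(z, K_J))
rate_constant = rate_two_point(K_J, K_J)             # rho = 1: constant-power jamming
rate_worst = rate_two_point(z_worst, K_J)
print(z_worst, rate_constant, rate_worst)
```

The one-dimensional search over `z_on` stands in for the non-linear program; with more support points the same idea applies with more variables (locations and weights).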

3  Case  AI:  Decoder  Uninformed 

The communicator has to determine the distributions $(dG(\theta), dP(x))$ that
maximize the amount of information $I(G,P;F)$ transmitted. The jammer has
to find the noise distribution $dF(z)$ to minimize the information received by the
decoder. Thus, the communicator's goal is to achieve

\[ \max_{dG(\theta),\,dP(x)}\ \min_{dF(z)}\ I(G,P;F) \]

whereas the jammer wants to achieve

\[ \min_{dF(z)}\ \max_{dG(\theta),\,dP(x)}\ I(G,P;F). \]

In  this  section  we  show  that  for  any  choice  of  strategy  of  either  player  there  is 
a  simple  characterization  of  the  optimal  reaction  strategy  of  his  opponent. 

Theorem 1: a) The jammer can achieve the minimum in

\[ \max_{dG(\theta),\,dP(x)}\ \min_{dF(z)}\ I(G,P;F) \]

with a distribution concentrated at at most $M(L-1)+2$ points.

b) The communicator can achieve the maximum in

\[ \min_{dF(z)}\ \max_{dG(\theta),\,dP(x)}\ I(G,P;F) \]

with a distribution concentrated at at most $M(L-1)+1$ points.

Discussion: Theorem 1(a) says that the communicator, in trying to achieve
$\max_{dG(\theta),\,dP(x)} \min_{dF(z)} I(G,P;F)$, has to consider only reaction strategies of the
jammer that have a finite number of points of support, i.e. for each $(dG(\theta), dP(x))$
chosen by the communicator the worst-case jammer distribution may be assumed
to be concentrated at a finite number of points and this number is bounded
uniformly (in $(dG(\theta), dP(x))$) by $M(L-1)+2$. It follows that for a fixed quantizer (i.e.
no randomization of the quantization) the worst-case jammer is one who chooses
such a finite-dimensional distribution. Similarly Theorem 1(b) says that the
jammer may, from his perspective of trying to achieve
$\min_{dF(z)} \max_{dG(\theta),\,dP(x)} I(G,P;F)$,
consider only finite-dimensional reaction strategies on the communicator's part.


To prove these results we use the following facts: (1) the convexity and
concavity properties of the mutual information function (it is convex in the channel
transition matrix and concave in the input distribution), (2) the equivalence of
weak convergence with Lévy convergence in our situation [Hegd 87], which we use
to show the continuity of our objective function in the strategies as well as
compactness of our strategy sets (this allows us to conclude that there is a worst-case
jamming strategy and a best-case communicator strategy) and (3) Dubins'
Theorem in order to demonstrate that the optimal reaction strategies are described by
distributions concentrated on a finite number of points. Dubins' Theorem allows
the extreme points of certain convex sets to be written as finite linear combinations
of extreme points of larger convex sets.

Proof  of  Theorem  1: 

We prove part (a) in detail. The modifications required to obtain part (b) are
obvious. We start by first proving two intermediate results, Lemmas 1 and 2.

Lemma 1: $I(G,P;F)$ is a Lévy-continuous functional of $dF(z)$ for any fixed
$(dG(\theta), dP(x))$.

Proof  of  Lemma  1: 

First we note that for every $(dG(\theta), dP(x))$, $I(P_{yx})$ is a convex function of
$P_{yx}$ [Csiz 81, pg. 50], i.e.,

\[ I\big(\alpha P^1_{yx} + (1-\alpha) P^2_{yx}\big) \le \alpha I(P^1_{yx}) + (1-\alpha) I(P^2_{yx}), \qquad 0 \le \alpha \le 1. \]

Next,

\[ p(y|x,z) = \int_\Theta p(y|x,\theta,z)\, dG(\theta) \]

is a continuous function of $z$ (since $p(y|x,\theta,z)$ is continuous in $z$ and $p(y|x,\theta,z) \le 1$,
this follows from the Dominated Convergence Theorem). Also

\[
\begin{aligned}
p(y|x) &= \int_K \int_\Theta p(y|x,\theta,z)\, dG(\theta)\, dF(z) \\
       &= \int_K p(y|x,z)\, dF(z).
\end{aligned}
\]

Hence $p(y|x)$ is a Lévy-continuous functional of $dF(z)$ and therefore $P_{yx}$ is a
Lévy-continuous functional of $dF(z)$.

Now $I(G,P;F)$ is a convex function of $P_{yx}$ and hence it is continuous in
the interior of the finite-dimensional set $W$ of all stochastic matrices. (Thus,
$I(G,P;F)$ is continuous at any point $P_{yx}$ such that at least one row of $P_{yx}$ is
not a one-point distribution, i.e. $P_{yx}$ is not deterministic.) Hence, $I(G,P;F)$ is
a Lévy-continuous function of $dF(z)$ for any fixed $(dG(\theta), dP(x))$. □

Let $S$ = set of all probability distributions on the Borel subsets of $K$, and let

\[ S^1 = \Big\{ dF(z) \in S : \int_K f(z)\, dF(z) = K_J \Big\} \tag{5} \]

be a hyperplane in $S$.

Lemma 2: $I(G,P;F)$ achieves its maximum (minimum) in $S^1$.

Proof of Lemma 2:

We note that $S$ is compact in the Lévy topology [Hegd 87, Appendix C].
Also $S^1$ is a hyperplane in $S$ which is closed (since $dF(z) \mapsto \int_K f(z)\, dF(z)$
is Lévy-continuous) in the Lévy topology.

Hence $S^1$, being a closed subset of a compact set, is itself (Lévy) compact.

Thus Lemma 1 asserts that for fixed $(dG(\theta), dP(x))$, $I(G,P;F)$ is a
Lévy-continuous functional on the compact set $S^1$. Hence it achieves its minimum
(maximum) at some point $dF^*(z) \in S^1$. □

The  above  lemmas  are  now  used  to  complete  the  proof  of  Theorem  1. 

From Lemma 2 we know that $I(G,P;F)$ achieves its minimum in $S^1$. Denote
the corresponding $P_{yx}$ as $P^*_{yx} = [p^*(y|x)]$, i.e.

\[ P^*_{yx} = \int_K \int_\Theta p(y|x,\theta,z)\, dG(\theta)\, dF^*(z). \tag{6} \]

Now consider the set

\[ \mathcal{A} = \Big\{ dF(z) \in S^1 : \int_K \int_\Theta p(y|x,z,\theta)\, dG(\theta)\, dF(z) = p^*(y|x),\ x \in A,\ y \in B^1 \Big\} \tag{7} \]

where $B^1 = \{0, 1, \ldots, L-2\}$. The set $\mathcal{A}$ is the intersection of $S$ with $M(L-1)+1$
hyperplanes, viz. $S^1$ and the $M(L-1)$ hyperplanes

\[ h_{yx} = \Big\{ dF(z) \in S^1 : \int_K \int_\Theta p(y|x,z,\theta)\, dG(\theta)\, dF(z) = p^*(y|x) \Big\}. \tag{8} \]

Furthermore:

$S$ is convex.

$S$ is linearly bounded ($S$ being compact in a metric space is bounded and hence
its intersection with any line is bounded).

$S$ being a compact subset of a metric space is closed and any line $l$ in the metric
space is closed. Thus $S$ is also linearly closed.

Hence we have that $S$ is a convex, linearly closed and linearly bounded set.
By Dubins' Theorem [Dubi 62] we can conclude that since $\mathcal{A}$ is the intersection
of $S$ with $M(L-1)+1$ hyperplanes, every extreme point of $\mathcal{A}$ is a convex
combination of $M(L-1)+2$ or fewer extreme points of $S$.

From our construction of $\mathcal{A}$ we know that $I(G,P;F)$ is constant on $\mathcal{A}$. Hence
for fixed $(dG(\theta), dP(x))$, $I(G,P;F)$ assumes its minimum value at an extreme
point of $\mathcal{A}$ also.

Hence, $I(G,P;F)$ assumes its minimum value at some point $dF(z)$ which is
a convex combination of $M(L-1)+2$ or fewer extreme points of $S$.

Since the extreme points of $S$ are the one-point distributions, we can finally
assert that for each $(dG(\theta), dP(x))$ the jammer can achieve the minimum in

\[ \max_{dG(\theta),\,dP(x)}\ \min_{dF(z)}\ I(G,P;F) \]

with a distribution concentrated at at most $M(L-1)+2$ points. This concludes the
proof of (a).
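The finite-support conclusion can be illustrated constructively for discrete distributions. The sketch below is not Dubins' Theorem itself but a Carathéodory-style reduction offered as an analogy: starting from a jammer distribution on many grid points, it repeatedly moves mass along a direction that leaves the total mass, the average power and every entry $p(y|x)$ with $y \in B^1$ unchanged, until at most $M(L-1)+2$ points remain. The binary-input, hard-decision channel with threshold 0.2 and all function names are assumptions made for the example.

```python
import math

def null_vector(rows, tol=1e-12):
    """Return a nonzero v with sum_j rows[i][j] * v[j] ~ 0 for all i (needs more columns than rows)."""
    n = len(rows[0])
    a = [list(r) for r in rows]
    pivots, r = [], 0
    for c in range(n):                       # Gauss-Jordan elimination with partial pivoting
        if r == len(a):
            break
        p = max(range(r, len(a)), key=lambda i: abs(a[i][c]))
        if abs(a[p][c]) < tol:
            continue                         # no pivot here: column c is free
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [v / piv for v in a[r]]
        for i in range(len(a)):
            if i != r:
                fac = a[i][c]
                a[i] = [vi - fac * vr for vi, vr in zip(a[i], a[r])]
        pivots.append((r, c))
        r += 1
    pivot_cols = {c for _, c in pivots}
    free = next(c for c in range(n) if c not in pivot_cols)
    v = [0.0] * n
    v[free] = 1.0                            # set one free variable, solve for the pivot variables
    for row, col in pivots:
        v[col] = -a[row][free]
    return v

def reduce_support(points, weights, constraint_fns):
    """Shrink a discrete distribution to at most len(constraint_fns) + 1 points while
    preserving total mass and the average of every function in constraint_fns."""
    pts, w = list(points), list(weights)
    m = len(constraint_fns) + 1
    while len(pts) > m:
        rows = [[1.0] * len(pts)] + [[g(z) for z in pts] for g in constraint_fns]
        v = null_vector(rows)
        t = min(wi / vi for wi, vi in zip(w, v) if vi > 1e-12)   # largest feasible step
        w = [wi - t * vi for wi, vi in zip(w, v)]
        keep = [i for i, wi in enumerate(w) if wi > 1e-12]
        pts, w = [pts[i] for i in keep], [w[i] for i in keep]
    return pts, w

def phi(u):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))

THETA = 0.2   # hypothetical hard-decision threshold; signals are -1 and +1
constraints = [
    lambda z: z,                                     # power functional f(z) = z
    lambda z: phi((THETA + 1.0) / math.sqrt(z)),     # p(y = 0 | x = 0, z)
    lambda z: phi((THETA - 1.0) / math.sqrt(z)),     # p(y = 0 | x = 1, z)
]
points = [0.1 * (i + 1) for i in range(20)]
weights = [1.0 / 20] * 20
new_points, new_weights = reduce_support(points, weights, constraints)
print(len(new_points), new_points, new_weights)
```

With $M = 2$ and $L = 2$ there are $M(L-1) = 2$ channel constraints plus the power and total-mass constraints, so the reduced distribution has at most $M(L-1)+2 = 4$ points and yields the same averaged channel matrix and average power as the original.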

For channels which are symmetric for each $\theta$ and $z$, i.e. $p(\cdot \mid x_i, z, \theta)$ is some
permutation of $p(\cdot \mid x_j, z, \theta)$, we see that the set $\mathcal{A}$ is actually the intersection of $S$
with $(L-1)+1$ hyperplanes only and hence part (a) of the theorem holds with
$(L-1)+2 = L+1$ instead of $M(L-1)+2$. For $M$-ary symmetric channels,
i.e. channels with $M$ inputs and $M$ outputs and such that for each $\theta$ and $z$,
$p(y_i \mid x_i, z, \theta) = 1 - \epsilon$ and $p(y_j \mid x_i, z, \theta) = \dfrac{\epsilon}{M-1}$, $i \ne j$, the bound on the number
of points of support reduces to 3.


For (b) we note that the jammer wants to achieve

\[ \min_{dF(z)}\ \max_{dG(\theta),\,dP(x)}\ I(G,P;F). \]

This may be written as

\[ \min_{dF(z)}\ \max_{dG(\theta)}\ C(G,F) \]

where $C(G,F) = \max_{dP(x)} I(G,P;F)$.

We note that, similarly to Lemma 1, for any fixed $dF(z)$, $C(G,F)$ is a
continuous functional of $dG(\theta)$. (Simply note that $C(G,F)$, being the maximum of
functions convex in $P_{yx}$, is also convex in $P_{yx}$ and proceed as before.) Using our
hypothesis that $p(y|x,\theta,z)$ is continuous in $\theta$ we can show that the maximum in

\[ \min_{dF(z)}\ \max_{dG(\theta)}\ C(G,F) \]

can for any $dF(z)$ be achieved by the decoder/quantizer by a distribution $dG(\theta)$
that is concentrated at at most $M(L-1)+1$ points.

Again for symmetric channels we note that part (b) of the theorem holds with
$L$ instead of $M(L-1)+1$. For $M$-ary symmetric channels this number is 2. The
number of points of support is one less than in part (a) as we have not imposed any
constraints on the distributions $dG(\theta)$ chosen by the quantizer. □
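The bounds quoted in the proof can be collected in a small helper. This is only a bookkeeping sketch of the counts stated above; the function name and the `channel` labels are hypothetical.

```python
def support_bounds(M, L, channel="general"):
    """Upper bounds on the number of support points from Theorem 1 and its proof.
    Returns (jammer_points, communicator_points)."""
    if channel == "general":
        channel_constraints = M * (L - 1)    # one hyperplane per (x, y) with y in B^1
    elif channel == "symmetric":
        channel_constraints = L - 1          # rows are permutations of one another
    elif channel == "m_ary_symmetric":
        channel_constraints = 1              # a single crossover parameter
    else:
        raise ValueError(f"unknown channel type: {channel}")
    # The jammer has one extra hyperplane (the power constraint); the communicator has none.
    return channel_constraints + 2, channel_constraints + 1

print(support_bounds(4, 8))                       # general channel
print(support_bounds(4, 8, "symmetric"))          # L + 1 and L
print(support_bounds(4, 4, "m_ary_symmetric"))    # 3 and 2
```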

3.1  Necessary  and  Sufficient  Conditions 

We now characterize the aforementioned finite-dimensional distributions by
means of necessary and sufficient conditions. We first briefly introduce the
necessary definitions and results from optimization theory and then specialize them to
our problem.

Let $\Omega$ be a convex set and let $f$ be a function from $\Omega$ into $\mathbb{R}$. For some
fixed $x_0$, if for all $x$ the limit

\[ \lim_{\alpha \downarrow 0} \frac{f\big((1-\alpha)x_0 + \alpha x\big) - f(x_0)}{\alpha} \tag{9} \]

exists, $f$ is said to be weakly differentiable at $x_0$ and the above limit is denoted
as $f'_{x_0}(x)$, the weak derivative at $x_0$. If $f$ is weakly differentiable in $\Omega$ at $x_0$ for all
$x_0$ in $\Omega$, $f$ is said to be weakly differentiable in $\Omega$. We now state an Optimization
Theorem that follows from [Luen 69, pg. 178].

Optimization Theorem: Let $f$ be a continuous, weakly differentiable, convex-cap
(concave) map from a compact, convex set $\Omega$ to $\mathbb{R}$. Let

\[ C = \sup_{x \in \Omega} f(x). \tag{10} \]

Then

1. $C = \max f(x) = f(x_0)$ for some $x_0 \in \Omega$.

2. A necessary and sufficient condition for $f(x_0) = C$ is $f'_{x_0}(x) \le 0$ for
all $x \in \Omega$.
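As a toy check of condition 2, the sketch below takes $\Omega = [0,1]$ and $f$ equal to the binary entropy function (a concave map chosen only for illustration) and approximates the weak derivative (9) with a small finite $\alpha$. The function names and the tolerance are assumptions of the example.

```python
import math

def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def weak_derivative(f, x0, x, alpha=1e-7):
    """Finite-alpha approximation of f'_{x0}(x) from definition (9)."""
    return (f((1.0 - alpha) * x0 + alpha * x) - f(x0)) / alpha

def satisfies_optimality(f, x0, candidates, tol=1e-5):
    """Condition 2 of the Optimization Theorem, checked on a finite set of directions x."""
    return all(weak_derivative(f, x0, x) <= tol for x in candidates)

omega_grid = [i / 100 for i in range(101)]
print(satisfies_optimality(binary_entropy, 0.5, omega_grid))   # x0 = 0.5 maximizes the entropy
print(satisfies_optimality(binary_entropy, 0.3, omega_grid))   # x0 = 0.3 does not
```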

Constrained Optimization Theorem: [Luen 69, pg. 217] Let $\Omega$ be a convex
subset of a linear vector space and $f$ and $g$ convex-cap functionals on $\Omega$ to
$\mathbb{R}$. Assume there is an $x_1 \in \Omega$ such that $g(x_1) < 0$ and let

\[ C' = \sup_{\substack{x \in \Omega \\ g(x) \le 0}} f(x). \tag{11} \]

If $C'$ is finite then there exists a constant $\lambda \ge 0$ such that

\[ C' = \sup_{x \in \Omega}\ [f(x) - \lambda g(x)]. \tag{12} \]

Furthermore if the supremum in the first equation is achieved by $x_0 \in \Omega$ and
$g(x_0) \le 0$, it is achieved by $x_0$ in the second equation and $\lambda g(x_0) = 0$.
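A one-dimensional toy problem shows how (11) and (12) agree. The example below is hypothetical and unrelated to the jamming model: $\Omega = [0,3]$, $f(x) = -(x-2)^2$ and a linear constraint $g(x) = x - 1$, for which the multiplier $\lambda = 2$ can be found by hand from $f'(1) = 2$.

```python
def f(x):
    """Concave objective."""
    return -(x - 2.0) ** 2

def g(x):
    """Linear constraint functional; feasible points satisfy g(x) <= 0."""
    return x - 1.0

omega = [i / 1000 for i in range(3001)]           # grid on Omega = [0, 3]

# Equation (11): constrained supremum.
constrained_sup = max(f(x) for x in omega if g(x) <= 0.0)

# Equation (12): unconstrained supremum of the Lagrangian with lambda = 2.
lam = 2.0
lagrangian_sup = max(f(x) - lam * g(x) for x in omega)
x0 = max(omega, key=lambda x: f(x) - lam * g(x))

print(constrained_sup, lagrangian_sup, x0, lam * g(x0))
```

Both suprema are attained at $x_0 = 1$, where the constraint is active and $\lambda g(x_0) = 0$, as the theorem states.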

Now given any $dG(\theta)$ and the power constraint we define

\[ U_C(G, K_J) = \sup_{\substack{F \in S \\ h_F \le K_J}} -I(G,P;F) \tag{13} \]

where $h_F = \int_K f(z)\, dF(z)$. To simplify notation we define

\[ D : S \to \mathbb{R} \quad \text{by} \quad D(F) = \int_K f(z)\, dF(z) - K_J. \tag{14} \]

Using the Constrained Optimization Theorem we will infer in Theorem 2 that
there exists a non-negative constant $\lambda = \lambda(G, K_J)$ for the constraint $D(F) \le 0$
such that

\[ U_C(G, K_J) = \sup_{F \in S}\ [-I(G,P;F) - \lambda D(F)]. \tag{15} \]

We now formulate necessary and sufficient conditions for the characterization of
the optimal distributions of Theorem 1 in the following two theorems.

Theorem 2: $U_c(G,K_J)$ is achieved by a distribution $F_0 \in S$ satisfying $D(F_0) \le 0$ and a necessary and sufficient condition for $U_c(G,K_J) = -I(G,P;F_0)$ is that for some constant $\lambda \ge 0$

$$\int_K [-i(z;G,F_0) - \lambda f(z)]\,dF(z) \le -I(G,P;F_0) - \lambda K_J \tag{16}$$

for all $F \in S$, where

$$i(z;G,F_0) = \sum_{x,y} p(x)\,p(y|x,z)\,\log \frac{\int p(y|x,z')\,dF_0(z')}{\sum_{x'} p(x') \int p(y|x',z')\,dF_0(z')}.$$

Proof of Theorem 2:

$D : S \to R$ is clearly linear, bounded, convex-cap, continuous and weakly
differentiable in $S$ with $D'_{F_1}(F_2) = D(F_2) - D(F_1)$. By choosing $F_1$ as a
distribution with unit mass at an appropriate point we can infer that $D(F_1) < 0$.

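Next we show that $I(G,P;F)$ is convex in $F$. Writing $P_{yx}(G,F)$ for the transition matrix induced by $G$ and $F$, and $P^k_{yx} = P_{yx}(G,F_k)$,

$$\begin{aligned}
I(G,P;\alpha F_1 + (1-\alpha)F_2) &= I\big(P_{yx}(G,\alpha F_1 + (1-\alpha)F_2)\big) \\
&= I\Big(\int_\Theta \int_K p(y|x,z,\theta)\,[\alpha\,dF_1(z) + (1-\alpha)\,dF_2(z)]\,dG(\theta)\Big) \\
&= I\big(\alpha P_{yx}(G,F_1) + (1-\alpha) P_{yx}(G,F_2)\big) \\
&= I\big(\alpha P^1_{yx} + (1-\alpha) P^2_{yx}\big) \\
&\le \alpha I(P^1_{yx}) + (1-\alpha) I(P^2_{yx}) \quad \text{(by the convexity of } I(\cdot) \text{ w.r.t. } P_{yx}\text{)} \\
&= \alpha I(G,P;F_1) + (1-\alpha) I(G,P;F_2).
\end{aligned} \tag{17}$$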
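The convexity step in (17) can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the input distribution `P_X` and the two binary channels `CH1` and `CH2` are hypothetical values that do not come from the paper, and the helper names are ours. It evaluates the gap between the right and left sides of (17) on a grid of mixing weights.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in nats for an input pmf p_x and channel[x][y] = p(y|x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_out):
            pyx = channel[x][y]
            if px > 0 and pyx > 0:
                total += px * pyx * math.log(pyx / p_y[y])
    return total

def mix(ch1, ch2, a):
    """Entrywise convex combination a*ch1 + (1-a)*ch2 of two channels."""
    return [[a * u + (1 - a) * v for u, v in zip(r1, r2)]
            for r1, r2 in zip(ch1, ch2)]

# Hypothetical input pmf and channels, chosen only for illustration.
P_X = [0.3, 0.7]
CH1 = [[0.9, 0.1], [0.2, 0.8]]
CH2 = [[0.6, 0.4], [0.5, 0.5]]

gaps = []
for k in range(11):
    a = k / 10
    lhs = mutual_information(P_X, mix(CH1, CH2, a))
    rhs = (a * mutual_information(P_X, CH1)
           + (1 - a) * mutual_information(P_X, CH2))
    gaps.append(rhs - lhs)  # non-negative whenever the inequality in (17) holds
```

Every entry of `gaps` should be non-negative up to rounding, with zero gap at the endpoints where the mixture coincides with one of the two channels.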

Then, since $U_c(G,K_J)$ is finite we can infer from the Constrained Optimization Theorem that there exists some constant $\lambda \ge 0$ such that

$$U_c = \sup_{F \in S}\, [-I(G,P;F) - \lambda D(F)].$$

Now we show that $I(G,P;F)$ is weakly differentiable at all $F \in S$.

Let $L(\alpha) = I(G,P;\alpha F_1 + (1-\alpha)F_2)$. Since $I(G,P;F)$ is convex in $F$, $L(\alpha)$ is convex in $\alpha$. Therefore $\frac{L(\alpha) - L(0)}{\alpha}$ is non-decreasing in $\alpha$ and bounded from below and thus $\lim_{\alpha \downarrow 0} \frac{L(\alpha) - L(0)}{\alpha}$ exists. Furthermore

Lemma 3: $I'_{F_1}(G,P;F_2) = \int i(z;G,F_1)\,dF_2(z) - I(G,P;F_1)$.

Proof  of  Lemma  3: 

See  Appendix  A. 
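As an informal sanity check of Lemma 3, the sketch below compares the closed-form weak derivative with a one-sided finite difference. Everything numeric here is an assumption made for illustration: the jammer is given three hypothetical states $z$, `CHANNEL[z]` plays the role of $p(y|x,z)$ (already averaged over the quantizer), and `P_X`, `F1`, `F2` are arbitrary probability vectors.

```python
import math

# Hypothetical fixed input pmf and binary channels p(y | x, z) for z = 0, 1, 2.
P_X = [0.4, 0.6]
CHANNEL = [
    [[0.95, 0.05], [0.05, 0.95]],
    [[0.80, 0.20], [0.30, 0.70]],
    [[0.60, 0.40], [0.45, 0.55]],
]

def averaged_channel(f):
    """q(y|x) = sum_z f[z] * p(y|x,z) for a jammer pmf f over the states z."""
    return [[sum(f[z] * CHANNEL[z][x][y] for z in range(len(f)))
             for y in range(2)] for x in range(2)]

def info_density(z, f):
    """i(z; G, F) from Theorem 2, with F given by the pmf f."""
    q = averaged_channel(f)
    q_y = [sum(P_X[x] * q[x][y] for x in range(2)) for y in range(2)]
    return sum(P_X[x] * CHANNEL[z][x][y] * math.log(q[x][y] / q_y[y])
               for x in range(2) for y in range(2))

def mutual_information(f):
    """I(G,P;F); it equals the F-average of i(z; G, F)."""
    return sum(f[z] * info_density(z, f) for z in range(len(f)))

def weak_derivative_formula(f1, f2):
    """Right-hand side of Lemma 3."""
    return (sum(f2[z] * info_density(z, f1) for z in range(len(f2)))
            - mutual_information(f1))

def weak_derivative_numeric(f1, f2, alpha=1e-6):
    """One-sided difference quotient of I along (1 - alpha) F1 + alpha F2."""
    mixed = [(1 - alpha) * a + alpha * b for a, b in zip(f1, f2)]
    return (mutual_information(mixed) - mutual_information(f1)) / alpha

F1 = [0.5, 0.3, 0.2]
F2 = [0.1, 0.2, 0.7]
formula = weak_derivative_formula(F1, F2)
numeric = weak_derivative_numeric(F1, F2)
```

The two values should agree to within the discretisation error of the difference quotient, which is of order `alpha`.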

We now have that $-I(G,P;F) - \lambda D(F)$ is convex-cap, continuous and weakly
differentiable in $F$. Thus, by the Optimization Theorem there is a distribution
function $F_0 \in S$ such that $U_c(G,K_J) = -I(G,P;F_0) - \lambda D(F_0)$. The necessary
and sufficient condition becomes

$$-I'_{F_0}(G,P;F) - \lambda D'_{F_0}(F) \le 0 \quad \text{for all } F \in S \tag{18}$$

or

$$\int_K [-i(z;G,F_0) - \lambda f(z)]\,dF(z) \le -I(G,P;F_0) - \lambda h_{F_0}. \tag{19}$$

If $h_{F_0} < K_J$ the power constraint is trivial and the constant $\lambda$ is zero, i.e.
$D(F_0) < 0$ but $\lambda D(F_0) = 0$. In either case $\lambda h_{F_0} = \lambda K_J$, so (19) coincides with (16). Thus the necessary and sufficient condition is
established. □

From Theorem 1 we know that it is possible to find $F_0$ from the set of
distributions with a finite number of points of support. Finding such an $F_0$
entails determining the set of points of increase as well as the amounts of increase
of $F_0$ at those points. Let $E_0$ denote the set of points of increase of $F_0$. We
now show

Theorem 3: Let $F_0$ be a probability distribution satisfying the power constraint. Then $F_0$ achieves $U_c(G,K_J)$ iff for some $\lambda \ge 0$

C1) $-i(z;G,F_0) \le -I(G,P;F_0) + \lambda(f(z) - K_J)$ for all $z \in K$.

C2) $-i(z;G,F_0) = -I(G,P;F_0) + \lambda(f(z) - K_J)$ for all $z \in E_0$.
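Conditions C1 and C2 lend themselves to a direct numerical test once a candidate distribution with finite support is in hand. The following sketch is a minimal illustration and not the paper's procedure: it assumes a hypothetical family of binary symmetric channels whose crossover probability `crossover(z)` is a concave function of the jammer power $z$, takes the cost to be $f(z) = z$, and checks C1 only on a finite grid standing in for $K$.

```python
import math

def crossover(z):
    """Hypothetical concave map from jammer power z to BSC crossover probability."""
    return 0.5 * (1.0 - math.exp(-z))

def bsc(eps):
    return [[1 - eps, eps], [eps, 1 - eps]]

def info_density(z, support, masses, p_x):
    """i(z; G, F0) for F0 = sum_k masses[k] * (unit mass at support[k])."""
    eps0 = sum(m * crossover(s) for s, m in zip(support, masses))
    q = bsc(eps0)                      # channel averaged over F0
    q_y = [sum(p_x[x] * q[x][y] for x in range(2)) for y in range(2)]
    ch = bsc(crossover(z))             # channel when the jammer plays z
    return sum(p_x[x] * ch[x][y] * math.log(q[x][y] / q_y[y])
               for x in range(2) for y in range(2))

def check_conditions(grid, support, masses, lam, k_j, p_x, cost=lambda z: z, tol=1e-9):
    """Return (C1 holds on the grid, C2 holds on the support)."""
    info = sum(m * info_density(s, support, masses, p_x)
               for s, m in zip(support, masses))
    c1 = all(-info_density(z, support, masses, p_x)
             <= -info + lam * (cost(z) - k_j) + tol for z in grid)
    c2 = all(abs(-info_density(s, support, masses, p_x) + info
                 - lam * (cost(s) - k_j)) <= tol for s in support)
    return c1, c2

K_J = 1.0
P_X = [0.5, 0.5]
GRID = [0.05 * k for k in range(81)]   # finite stand-in for K = [0, 4]

# Candidate 1: all mass at z = K_J, multiplier from the tangent of crossover().
eps0 = crossover(K_J)
LAM = math.log((1 - eps0) / eps0) * 0.5 * math.exp(-K_J)
c1_ok, c2_ok = check_conditions(GRID, [K_J], [1.0], LAM, K_J, P_X)

# Candidate 2: all mass at z = 0.5 with lam = 0; the jammer wastes power.
bad_c1, bad_c2 = check_conditions(GRID, [0.5], [1.0], 0.0, K_J, P_X)
```

Under these assumptions the first candidate passes both checks, while the second fails C1 because larger values of $z$ reduce the mutual information further. A grid check of this kind can only refute C1, not prove it for all of $K$.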


Proof  of  Theorem  3: 


The sufficiency is clear because if both conditions C1 and C2 hold, the conditions of
Theorem 2 hold. We show the necessity.

Assume that $F_0$ is “optimal” but C1 is not true. Then there must exist some
$z_1 \in K$ such that $-i(z_1;G,F_0) > -I(G,P;F_0) + \lambda(f(z_1) - K_J)$. Let $F_1(z)$
be a probability distribution with a unit increase at such a point $z_1 \in K$. Then

$$\int_K [-i(z;G,F_0) - \lambda f(z)]\,dF_1(z) > -I(G,P;F_0) - \lambda K_J \tag{20}$$

which contradicts Theorem 2. Hence C1 must be true.

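Now assume that $F_0$ is “optimal” but C2 is not true. Then since C1 is true,
$-i(z;G,F_0) < -I(G,P;F_0) + \lambda(f(z) - K_J)$ for all $z$ in $E'$ where $E'$ is
some subset of $E_0$ with positive measure, i.e.

$$\int_{E'} dF_0(z) = c > 0. \tag{21}$$

Since $\int_{E_0 - E'} dF_0(z) = 1 - c$ and on $E_0 - E'$

$$-i(z;G,F_0) = -I(G,P;F_0) + \lambda(f(z) - K_J) \tag{22}$$

and

$$\int_K [-i(z;G,F_0) - \lambda f(z)]\,dF_0(z) = \int_{E'} [-i(z;G,F_0) - \lambda f(z)]\,dF_0(z) + \int_{E_0 - E'} [-i(z;G,F_0) - \lambda f(z)]\,dF_0(z),$$

where the left side equals $-I(G,P;F_0) - \lambda K_J$ (because $\int_K i(z;G,F_0)\,dF_0(z) = I(G,P;F_0)$ and $\lambda h_{F_0} = \lambda K_J$) and the right side is strictly smaller than $-I(G,P;F_0) - \lambda K_J$, we have

$$-I(G,P;F_0) - \lambda K_J < -I(G,P;F_0) - \lambda K_J, \quad \text{i.e. a contradiction.} \tag{23}$$

Hence C2 must be true too. □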

Theorems  1  and  3  reduce  the  calculation  of  the  distributions  describing  the 
reaction  strategies  to  finite-dimensional  non-linear  programming  problems.  They 
can  be  used  to  simplify  the  search  for  conservative  strategies  which  are  optimal  for 
either  player.  In  Theorem  4  below  we  assert  the  existence  of  conservative  strategies 
for  each  player. 

Theorem 4: For the game described in Case AI, there exists a conservative
strategy $(dG(\theta), dP(x))$ for the communicator and a conservative strategy $dF(z)$
for the jammer, i.e. strategies such that

i) $\displaystyle \min_{dF(z)} I(G,P;F) = \max_{dP(x),\,dG(\theta)}\ \min_{dF(z)} I(G,P;F)$ and

ii) $\displaystyle \max_{dP(x),\,dG(\theta)} I(G,P;F) = \min_{dF(z)}\ \max_{dP(x),\,dG(\theta)} I(G,P;F)$.

Proof  of  Theorem  4: 


From Lemmas 1 and 2 we note that

a) $I(G,P;F)$ is lower-semicontinuous in $dF(z)$ for each $(dG(\theta), dP(x))$ and

b) There exists $(dG(\theta), dP(x))$ such that $I(G,P;F)$ is lower semi-compact in $dF(z)$.

Theorem 4(i) now follows from a fundamental existence theorem [Aubi 82, pg 209,
Th. 1]. Theorem 4(ii) follows similarly. □

3.2  The  Remaining  Cases 

Case BI: With $F(z)$ now recognized as a one-dimensional distribution Theorems 1 and 2 are easily seen to be true.

Case CI: We redefine $S$ as follows: $S = \bigcup_{i=1}^{M} L_i$ where $L_i$ is the space of
product distributions such that

$$\Pr(Z_i > 0) > 0, \qquad \Pr(Z_j = 0) = 1, \quad j \ne i.$$

By previous arguments each $L_i$ is Levy compact and hence so is $S$. Now the
proofs of Th. 1 and Th. 2 follow as before.

Case DI: We perform the analysis by fixing $D - 1$ of the $D$ distributions
$dF_1, \ldots, dF_D$. By minor modifications in the proof of Lemma 1 we see that $I(X;Y)$
is a Levy continuous functional of $dF_i(z_i)$ for each $i$. Defining $S$ and S1 similarly
except that now both are spaces of distributions of $dF_i(z_i)$ instead of $dF(z)$ we see
that for each $(dG(\theta), dP(x))$ the jammer can achieve the minimum in

$$\max_{(dG(\theta),\,dP(x))}\ \min_{dF(z) = dF_1(z_1)\,dF_2(z_2) \cdots dF_D(z_D)} I(G,P;F) \tag{26}$$

with a distribution $dF_i$ concentrated at at most $M(L-1)+2$ points.

Since $i$ is arbitrary we can assert that the jammer can achieve the minimum
in (26) with distributions $dF_i$, $i = 1, \ldots, D$, each of which is concentrated at at
most $M(L-1)+2$ points. Part (b) of Theorem 1 and Theorem 2 are easily seen
to be true as stated for this case.

4  Case AII: Decoder Informed

We have an arbitrary joint distribution on $Z_1, \ldots, Z_D$; the jammer chooses
$dF(z)$ and knows that the decoder knows $\theta$. The communicator chooses $dG(\theta)$
and further the decoder knows $\theta$.

In this case we make a “compatibility” assumption, that is, for every $\theta$ and
$dF(z)$ the capacity-achieving input distribution $dP(x)$ remains the same.

While “compatibility” certainly restricts the applicability of our model, we show by
example that it is often a worst-case assumption. For instance, we know [Dobr
59] that if $M = L$ and if the jammer’s strategy set is restricted such that for each
distribution $dF(z)$ and quantizer $\theta$, $\text{Prob}\{\text{error} \mid x\} \le e$ for every $x$, then the
saddle-point strategy for the jammer is to choose a distribution such that

$$p(y|x) = \frac{1}{M} \quad \text{for all } y, x \quad \text{if } e \ge 1 - \frac{1}{M}$$

and

$$p(y|x) = \begin{cases} \dfrac{e}{M-1}, & y \ne x \\[4pt] 1 - e, & y = x \end{cases} \quad \text{if } e \le 1 - \frac{1}{M}$$

and the saddle-point strategy for the communicator is to choose a uniform distribution on the input alphabet. In our model this corresponds to choosing the
canonical noise variables so that $p(y|x,\theta)$ is a symmetric channel for each $\theta$. Such
symmetry (and thereby “compatibility”) is obtained in a number of other situations as a saddle-point strategy. Under certain conditions, when we have convex
constraints in the $M$ noise variables affecting the $M$ inputs of the channel which
are invariant under any permutation of the $M$ variables (i.e. a “symmetric” constraint) then the choice of a uniform distribution on the input and the choice of a
symmetric channel are saddle-point strategies for the communicator and the jammer respectively (see Appendix B). To describe one more example, if we have $M$
inputs and $M$ outputs,

$$y_i = n_i, \quad i = 1, \ldots, M, \quad i \ne j$$
$$y_j = A + n_j, \quad i = j,$$

$n_i$ are $N(0, v_i)$, $i = 1, \ldots, M$ independent random variables and there is further
the constraint $\sum_{i=1}^{M} v_i = c_1$ then, from arguments similar to those in Appendix B,
it can be seen that the saddle point strategy is to choose $v_i = \frac{c_1}{M}$ and a uniform
distribution on the input.
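The $M$-ary symmetric channel in the [Dobr 59] example can be made concrete with a short computation. The sketch below is illustrative only: the alphabet size `M = 4` and the error levels are arbitrary choices of ours. It evaluates the mutual information under a uniform input and shows it falling to zero as the error level reaches $1 - 1/M$, where every transition probability equals $1/M$.

```python
import math

def symmetric_channel(m, e):
    """M-ary symmetric channel: p(y|x) = 1 - e if y == x, else e / (m - 1)."""
    return [[1 - e if y == x else e / (m - 1) for y in range(m)]
            for x in range(m)]

def mutual_information(p_x, channel):
    """I(X;Y) in nats for an input pmf p_x and channel[x][y] = p(y|x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    return sum(p_x[x] * channel[x][y] * math.log(channel[x][y] / p_y[y])
               for x in range(len(p_x)) for y in range(n_out)
               if p_x[x] > 0 and channel[x][y] > 0)

M = 4                                   # illustrative alphabet size
UNIFORM = [1.0 / M] * M
ERROR_LEVELS = [0.0, 0.25, 0.5, 0.75]   # the last one equals 1 - 1/M
rates = [mutual_information(UNIFORM, symmetric_channel(M, e))
         for e in ERROR_LEVELS]
```

With these values `rates` starts at $\log M$ for a noiseless channel, decreases with the error level, and vanishes at $e = 1 - 1/M$.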

Utilization of the “compatibility” assumption allows us to write the above as

$$\min_{dF(z)}\ \max_{dG(\theta)} E_G(C(\theta,F))$$

and

$$\max_{dG(\theta)}\ \min_{dF(z)} E_G(C(\theta,F))$$

where $C(\theta,F) = \max_{dP(x)} I(\theta;F)$ and $I(\theta;F) = I(X;Y|\theta)$.


In this section we prove the existence of a saddle point. The main result is
stated in the following theorem:

Theorem 5: There exists a pair of distributions $(dG^*(\theta), dF^*(z))$ such that

$$E_G(C(\theta,F^*)) \le E_{G^*}(C(\theta,F^*)) \le E_{G^*}(C(\theta,F))$$

for all feasible $dG(\theta)$, $dF(z)$, i.e., $(dG^*(\theta), dF^*(z))$ is a saddle point for the game
in case AII.


Proof of Theorem 5: The set of all feasible $dF$’s, i.e.

$$\Big\{ dF(z) : \int_K f(z)\,dF(z) \le K_J \Big\}, \qquad 0 \le z_i \le b_i,$$

is clearly convex and compact. The set of all $dG$’s is also convex and compact.

We note that for any fixed $dF(z)$, $C(\theta,F)$ is a continuous function of $\theta$. Indeed,

$$p(y|x,\theta) = \int_K p(y|x,\theta,z)\,dF(z)$$

is by our earlier arguments a continuous function of $\theta$. Hence, $P_{yx}(\theta)$ is a continuous function of $\theta$. Also $C(\theta,F) = C(P_{yx}(\theta))$ and
we know that $C(P_{yx}(\theta))$ is convex in $P_{yx}(\theta)$.
Therefore, for every $\theta \in \Theta$ such that $P_{yx}(\theta)$ is not deterministic, $C(P_{yx}(\theta))$ is a
continuous function of $P_{yx}(\theta)$. Hence, for fixed $dF(z)$, $C(\theta,F) = C(P_{yx}(\theta))$ is
a continuous function of $\theta$ and so

$$E_G(C(\theta,F)) = \int_\Theta C(\theta,F)\,dG(\theta)$$

is a Levy continuous functional of $dG(\theta)$.

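Since $E_G(C(\theta,F))$ is linear in $dG(\theta)$ it is also a concave function of $dG(\theta)$.
Next we note that $C(\theta,F)$ is convex in $dF(z)$ for each $\theta$ since $C(\theta,F) = C(P_{yx}(\theta))$.
Hence

$$C(\theta, \alpha F_1 + (1-\alpha)F_2) \le \alpha C(\theta,F_1) + (1-\alpha) C(\theta,F_2), \qquad 0 \le \alpha \le 1.$$

Taking expectations w.r.t. $G$

$$\int_\Theta C(\theta, \alpha F_1 + (1-\alpha)F_2)\,dG(\theta) \le \int_\Theta \big(\alpha C(\theta,F_1) + (1-\alpha) C(\theta,F_2)\big)\,dG(\theta)$$

$$\therefore\ E_G(C(\theta, \alpha F_1 + (1-\alpha)F_2)) \le \alpha E_G(C(\theta,F_1)) + (1-\alpha) E_G(C(\theta,F_2)).$$

Consequently, $E_G(C(\theta,F))$ is a convex function in $dF(z)$.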

Also $E_G(C(\theta,F))$ is Levy-continuous in $dF(z)$. To prove this it suffices to
show that for any sequence $F_n$ converging to $F$ in the Levy metric

$$E_G(C(\theta,F_n)) \to E_G(C(\theta,F)).$$

Since convergence in the Levy metric is in our case equivalent to weak convergence [Hegd 87, Appendix C] it suffices to show this for $F_n$ converging weakly to $F$. However,

$$\begin{aligned}
\lim_n E_G(C(\theta,F_n)) &= \lim_n \int_\Theta C(\theta,F_n)\,dG \\
&= \int_\Theta \lim_n C(\theta,F_n)\,dG \quad \text{(by the Dominated Convergence Theorem)} \\
&= \int_\Theta C(\theta,F)\,dG \quad \text{(since } C(\theta,F) \text{ is Levy-continuous in } F\text{)} \\
&= E_G(C(\theta,F))
\end{aligned}$$

which proves Levy-continuity in $dF(z)$. From these properties of the objective
function and the convexity and compactness of the feasible strategy sets we recognize that the hypotheses of the Sion minmax theorem of game theory are satisfied
[Aubi 82, Th7, pg 218]. This concludes the proof of Theorem 5. □

We note that these saddle-point distributions need not have finite support.
However, in this case we have an equilibrium and, with no further knowledge of
each other’s choice of strategy, the jammer and the communicator should be content
utilizing $dG^*(\theta)$ and $dF^*(z)$.

Using the Optimization Theorem and the Constrained Optimization Theorem
we can derive necessary and sufficient conditions at these saddle points. Given any
$dG(\theta)$ and the power constraint we define

$$U_c(K_J, G) = \sup_{F \in S,\; h_F \le K_J} -E_G(C(\theta,F)) \tag{28}$$

and given any $dF(z)$ we define

$$V_c(F) = \sup_{G \in Q} E_G(C(\theta,F)) \tag{29}$$

where $Q$ is the space of distributions on $\Theta$. Then we have

Theorem 6: The saddle-point strategies $dF^*$, $dG^*$ satisfy the following inequalities:

$$E_{G^*}\Big(\int_K \big(-i(z;\theta,F^*) - \lambda f(z)\big)\,dF(z)\Big) \le E_{G^*}(-C(\theta,F^*)) - \lambda K_J$$

for some $\lambda \ge 0$, for all $F$, where

$$i(z;\theta,F) = \sum_{x,y} p(x)\,p(y|x,z,\theta)\,\log \frac{\int p(y|x,z',\theta)\,dF(z')}{\sum_{x'} p(x') \int p(y|x',z',\theta)\,dF(z')},$$

and

$$E_G(C(\theta,F^*)) \le E_{G^*}(C(\theta,F^*))$$

for all $G$.


Proof  of  Theorem  6: 

For any $F$ denote the weak derivative of $E_G(C(\theta,F))$ at $G_0$ as $D_{G_0}(E_G(C(\theta,F)))$
and for any $G$ denote the weak derivative of $E_G(C(\theta,F))$ at $F_0$ as $D_{F_0}(E_G(C(\theta,F)))$.
Using Lemma 3 and the Dominated Convergence Theorem, we have

$$D_{F_1}(E_G(-C(\theta,F_2))) = E_G\Big(-\int i(z;\theta,F_1)\,dF_2\Big) + E_G(C(\theta,F_1)) \tag{32}$$

for any $F_1$, $F_2$, and

$$D_{G_1}(E_{G_2}(C(\theta,F))) = E_{G_2}(C(\theta,F)) - E_{G_1}(C(\theta,F)). \tag{33}$$

Now letting $F_1 = F^*$, $G_1 = G^*$ in the first equation we have, using the Constrained Optimization Theorem and the Optimization Theorem and the properties
of $E_G(C(\theta,F))$ as in Theorem 2, that a necessary and sufficient condition for $F^*$
to achieve $U_c(K_J, G^*)$ is

$$E_{G^*}\Big(\int \big(-i(z;\theta,F^*) - \lambda f(z)\big)\,dF(z)\Big) \le E_{G^*}(-C(\theta,F^*)) - \lambda K_J \tag{34}$$

for some $\lambda \ge 0$, for all $F$.

Letting $F_1 = F^*$, $G_1 = G^*$ in the second equation gives us similarly that a
necessary and sufficient condition to achieve $V_c(F^*)$ is

$$E_G(C(\theta,F^*)) \le E_{G^*}(C(\theta,F^*)) \tag{35}$$

for all $G$.

Since at a saddle-point $U_c(K_J, G^*)$ and $V_c(F^*)$ are simultaneously achieved,
the theorem follows. □

4.1  The  Remaining  Cases 

Case BII: Theorem 5 holds with $F(z)$ as a one-dimensional distribution.

Case CII: Although $S$ is compact, it is not convex and so we cannot demonstrate
that there is a saddle point strategy.

Case DII: Again we have that $E_G(C(\theta,F))$ is a Levy continuous functional
of $dG(\theta)$ and is concave in $dG(\theta)$. Also $E_G(C(\theta,F))$ is Levy continuous in
$(dF_1(z), \ldots, dF_D(z))$. However $E_G(C(\theta, F_1, \ldots, F_D))$ is not convex in $(F_1, \ldots, F_D)$.
Hence we cannot assert the existence of a saddle point in this case.

4.2  Fixed  Quantizer 

Before concluding this section we also point out that if we did not have randomized quantization then without “compatibility” the game would have a saddle-point
where the jammer's saddle-point distribution need be concentrated at at most
$M(L-1)+2$ points. We summarize this in Theorem 7.


Theorem 7: For any quantizer $\theta$, there exists a pair of distributions $dP^*(x)$, $dF^*(z)$
such that

$$I(\theta,P,F^*) \le I(\theta,P^*,F^*) \le I(\theta,P^*,F) \tag{36}$$

for all feasible $dP$, $dF$. Moreover $dF^*(z)$ can be chosen to be concentrated at at
most $M(L-1)+2$ points and necessary and sufficient conditions for $dF^*(z)$ and
$dP^*(x)$ are, for some $\lambda_1, \lambda_2 \ge 0$,

$$-i(z;\theta,F^*) \le -I(\theta,P^*,F^*) + \lambda_1(f(z) - K_J) \tag{37}$$

for all $z \in K$ and

$$-i(z;\theta,F^*) = -I(\theta,P^*,F^*) + \lambda_1(f(z) - K_J) \tag{38}$$

for all $z \in E_0$, where $i(\cdot\,;\cdot,\cdot)$ is as defined in Theorem 2 with $G$ concentrated on $\theta$.
Also

$$I_x(\theta,P^*,F^*) = \lambda_2 \tag{39}$$

for all $x$ such that $P^*(x) > 0$ and

$$I_x(\theta,P^*,F^*) \le \lambda_2 \tag{40}$$

for all $x$ such that $P^*(x) = 0$, where

$$I_x(\theta,P^*,F^*) \triangleq \sum_y p(y|x,\theta)\,\log \frac{p(y|x,\theta)}{\sum_{x'} P^*(x')\,p(y|x',\theta)}.$$
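Conditions (39) and (40) are the familiar capacity conditions on the input distribution, and they are easy to test for a given channel matrix. The sketch below is a hedged illustration: `CHANNEL` is a hypothetical three-input, two-output matrix standing in for $p(y|x,\theta)$ after averaging over $dF^*$, and the function names are ours rather than the paper's.

```python
import math

def per_input_information(p_x, channel):
    """I_x = sum_y p(y|x) log( p(y|x) / sum_x' P(x') p(y|x') ) for each input x."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    return [sum(row[y] * math.log(row[y] / p_y[y])
                for y in range(n_out) if row[y] > 0)
            for row in channel]

def satisfies_capacity_conditions(p_x, channel, tol=1e-9):
    """Check (39) on inputs with P(x) > 0 and (40) on inputs with P(x) = 0."""
    i_x = per_input_information(p_x, channel)
    lam2 = sum(p * v for p, v in zip(p_x, i_x))   # equals I(X;Y) at p_x
    used = all(abs(v - lam2) <= tol for p, v in zip(p_x, i_x) if p > 0)
    unused = all(v <= lam2 + tol for p, v in zip(p_x, i_x) if p == 0)
    return used and unused, lam2

# Hypothetical channel: two inputs form a BSC(0.1); the third is pure noise.
CHANNEL = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

ok, lam2 = satisfies_capacity_conditions([0.5, 0.5, 0.0], CHANNEL)
skewed_ok, _ = satisfies_capacity_conditions([0.4, 0.6, 0.0], CHANNEL)
```

For this channel the distribution that is uniform over the first two inputs passes both conditions, with $\lambda_2$ equal to the capacity of the binary symmetric channel, while the skewed distribution fails (39).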


Proof  of  Theorem  7: 

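From the proof of Theorem 5 we know that all we need to show is that $I(\theta,P,F)$
is (Levy) continuous in $dP(x)$. We show this by considering any sequence $dP_n(x) \to dP(x)$
and showing $I(\theta,P_n,F) \to I(\theta,P,F)$. Since $x$ belongs to the finite set $A$,
weak convergence is equivalent to convergence in any finite-dimensional metric.

Now

$$\begin{aligned}
|I(\theta,P_n,F) - I(\theta,P,F)| &= \Big| \sum_{x,y} P_n(x)\,p(y|x,\theta)\,\log \frac{p(y|x,\theta)}{\sum_{x'} P_n(x')\,p(y|x',\theta)} - \sum_{x,y} P(x)\,p(y|x,\theta)\,\log \frac{p(y|x,\theta)}{\sum_{x'} P(x')\,p(y|x',\theta)} \Big| \\
&\le \Big| \sum_{x,y} P_n(x)\,p(y|x,\theta)\,\log \frac{p(y|x,\theta)}{\sum_{x'} P_n(x')\,p(y|x',\theta)} - \sum_{x,y} P_n(x)\,p(y|x,\theta)\,\log \frac{p(y|x,\theta)}{\sum_{x'} P(x')\,p(y|x',\theta)} \Big| \\
&\quad + \Big| \sum_{x,y} P_n(x)\,p(y|x,\theta)\,\log \frac{p(y|x,\theta)}{\sum_{x'} P(x')\,p(y|x',\theta)} - \sum_{x,y} P(x)\,p(y|x,\theta)\,\log \frac{p(y|x,\theta)}{\sum_{x'} P(x')\,p(y|x',\theta)} \Big| \\
&\le \Big| \sum_{x,y} P_n(x)\,p(y|x,\theta)\,\log \frac{\sum_{x'} P(x')\,p(y|x',\theta)}{\sum_{x'} P_n(x')\,p(y|x',\theta)} \Big| + \sum_x L D\,|P_n(x) - P(x)|
\end{aligned} \tag{41}$$

where $L$ is the size of the output alphabet and

$$D = \max_{x,y} \Big| p(y|x,\theta)\,\log \frac{p(y|x,\theta)}{\sum_{x'} P(x')\,p(y|x',\theta)} \Big|,$$

so that

$$|I(\theta,P_n,F) - I(\theta,P,F)| \le \sum_y \Big| \log \frac{\sum_{x'} P_n(x')\,p(y|x',\theta)}{\sum_{x'} P(x')\,p(y|x',\theta)} \Big| + \sum_x L D\,|P_n(x) - P(x)|. \tag{42}$$

Again since $A$ is finite we can say that for all $\delta > 0$ there exists $N$ such that for all $n > N$

$$1 - \delta < \frac{P_n(x)}{P(x)} < 1 + \delta \quad \forall x \in A$$

$$\therefore\ 1 - \delta < \frac{P_n(x)\,p(y|x,\theta)}{P(x)\,p(y|x,\theta)} < 1 + \delta \quad \forall x \in A \tag{43}$$

$$\therefore\ 1 - \delta < \frac{\sum_x P_n(x)\,p(y|x,\theta)}{\sum_x P(x)\,p(y|x,\theta)} < 1 + \delta.$$

By the continuity of the log function we can say that for all $\epsilon > 0$ there exists $\delta > 0$ such that

$$\Big| \log \frac{\sum_x P_n(x)\,p(y|x,\theta)}{\sum_x P(x)\,p(y|x,\theta)} \Big| < \epsilon.$$

The second term in (41) can also clearly be made $< \epsilon$ for sufficiently large $n$.
Thus the continuity of $I(\theta,P,F)$ w.r.t. $P$ is confirmed and the first part of the
theorem follows. The bound on the number of points of support of $dF^*$ follows
from Theorem 1(a). The necessary and sufficient conditions are derived as before
from Theorem 3 and well-known results about channel capacity [Gall 68, pg. 91].
□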


5  Conclusions 


We  have  constructed  fairly  general  channel  models  which  are  capable  of  repre¬ 
senting  a  number  of  jamming  situations.  The  jammers  we  have  considered  have 
all  been  non-adaptive  and  using  results  from  the  compound  channel  we  are  able 
to give operational significance to our minimax performance measures, i.e. we can
assert  the  existence  of  encoders  and  decoders  which  can  perform  at  arbitrarily  low 
probabilities  of  error  at  rates  close  to  our  performance  measures.  Our  analysis  is 
also  clearly  applicable  to  many  restrictions  on  the  jammer’s  strategy  set  other  than 
the  ones  we  have  considered. 

In  the  case  with  the  decoder  uninformed  (case  I)  we  have  shown  that  the  worst- 
case  jammer  strategy  (as  well  as  best  communicator  strategy)  needs  only  be  one  of 


the  class  of  distributions  with  finite  support.  We  have  a  bound  on  the  number  of 
these  points  of  support  in  terms  of  the  sizes  of  the  input  and  the  output  alphabet. 
Thus  we  have  reduced  the  computation  of  the  worst  case  jamming  strategies  to  a 
finite-dimensional  non-linear  programming  problem.  Moreover  we  can  characterize 
these  distributions  by  necessary  and  sufficient  conditions  which  are  fairly  easy  to 
test. 

In the cases with the decoder informed we reduce the communicator’s strategy
set (either by using the “compatibility” assumption or by fixing a quantizer). In
this case when we have convexity with respect to the jammer’s strategy (as in cases
AII and BII) we are able to demonstrate the existence of a saddle-point strategy.
For the case with non-randomized quantization we are further able to characterize
these saddle-point strategies using the earlier theory.

As we have mentioned earlier all the above presupposes non-adaptive jamming.
The compound channel model which we use indirectly by our choice of objective
function is appropriate to use in this case. We can allow for more sophisticated
jammers if we incorporate the cases where the jammer’s strategies are allowed to
depend on the previous (and present) channel inputs. The appropriate channel
model to use then is that of the arbitrarily “star” varying channel (A*VC) [Csiz
81, pg. 233]. This model generalizes the arbitrarily varying channel (AVC) and
includes it as a special case. It is known that the m-capacity (i.e. capacity with
maximum probability of error over all the codewords) of the A*VC is the same as
that of the corresponding AVC [Csiz 81, pg. 232]. This capacity is known for the
case of binary output alphabet (and finite input alphabet) and is known to be equal
to $\max_{dP(x)} \min_{W \in \bar{\bar{\mathcal{W}}}} I(X;Y)$ where $X$ and $Y$ are the input and the output respectively, $W$
is any channel chosen from the set of channels $\mathcal{W}$ and $\bar{\bar{\mathcal{W}}}$ is the row-convex closure
of $\mathcal{W}$ [Csiz 81]. In our case the jammer’s strategy set is already
row-convex closed and hence the appropriate programs would be

a) For the communicator:

$$\max_{(dG(\theta),\,dP(x))}\ \min_{dF(z)} I(G,P;F)$$

b) For the jammer:

$$\min_{dF(z)}\ \max_{(dG(\theta),\,dP(x))} I(G,P;F)$$

which is the same objective function as we have used. Similarly, in the case with
decoder informed we would obtain the same objective functions. Thus, all the
results derived in the previous sections for the case of mutual information can be
extended to the case of the A*VC with binary output. This model may
be viewed as a worst-case representation of adaptive jamming. Unfortunately the
m-capacity of the A*VC is as yet unknown for output sizes greater than 2. On the
other hand the a-capacity of the AVC (i.e. the capacity with average probability
of error) is known to be either 0 or else $\max_{dP(x)} \min_{W \in \bar{\mathcal{W}}} I(X;Y)$ where $\bar{\mathcal{W}}$ is the convex
closure of the set $\mathcal{W}$ to which $W$ belongs [Csiz 81, pg. 214]. Since in our model
the set of channels is convex as well as row-convex the a-capacity is known to be
greater than 0 iff the m-capacity is greater than 0 [Ahls 78]. Thus with average
probability of error whenever the jammer’s strategy set is such that he cannot force
the capacity to be 0 then all the results of the preceding sections extend to the
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Appendix  A 


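Lemma 3: $I'_{F_1}(G;F_2) = \int i(z;G,F_1)\,dF_2(z) - I(G;F_1)$

where

$$i(z;G,F_1) = \sum_{x,y} p(x)\,p(y|x,z)\,\log \left( \frac{\int p(y|x,z')\,dF_1(z')}{\sum_{x'} p(x') \int p(y|x',z')\,dF_1(z')} \right).$$

Proof of Lemma 3:

Write $dF_\alpha = (1-\alpha)\,dF_1 + \alpha\,dF_2$. Then

$$\begin{aligned}
I'_{F_1}(G;F_2) = \lim_{\alpha \downarrow 0} \frac{1}{\alpha} \Bigg\{ &\sum_{x,y} p(x) \Big( \iint p(y|x,z,\theta)\,dF_\alpha\,dG(\theta) \Big) \log \frac{\iint p(y|x,z,\theta)\,dF_\alpha\,dG(\theta)}{\sum_{x'} p(x') \iint p(y|x',z,\theta)\,dF_\alpha\,dG(\theta)} \\
- &\sum_{x,y} p(x) \Big( \iint p(y|x,z,\theta)\,dF_1\,dG(\theta) \Big) \log \frac{\iint p(y|x,z,\theta)\,dF_1\,dG(\theta)}{\sum_{x'} p(x') \iint p(y|x',z,\theta)\,dF_1\,dG(\theta)} \Bigg\}.
\end{aligned}$$

Denoting $\int p(y|x,z,\theta)\,dG(\theta)$ as $p(y|x,z)$, and writing $q_\alpha(y|x) = \int p(y|x,z)\,dF_\alpha(z)$ and $d_\alpha(y) = \sum_x p(x)\,q_\alpha(y|x)$ for brevity,

$$\begin{aligned}
I'_{F_1}(G;F_2) &= \lim_{\alpha \downarrow 0} \frac{1}{\alpha} \Big\{ \sum_{x,y} p(x)\,q_\alpha(y|x) \log \frac{q_\alpha(y|x)}{d_\alpha(y)} - \sum_{x,y} p(x)\,q_0(y|x) \log \frac{q_0(y|x)}{d_0(y)} \Big\} \\
&= \lim_{\alpha \downarrow 0} \frac{1}{\alpha} \sum_{x,y} p(x) \Big[ \int p(y|x,z)\,\alpha\,dF_2 \,\log \frac{q_\alpha(y|x)}{d_\alpha(y)} - \int p(y|x,z)\,\alpha\,dF_1 \,\log \frac{q_\alpha(y|x)}{d_\alpha(y)} \Big] \\
&\quad + \lim_{\alpha \downarrow 0} \frac{1}{\alpha} \sum_{x,y} p(x) \Big[ \int p(y|x,z)\,dF_1 \,\log \frac{q_\alpha(y|x)}{d_\alpha(y)} - \int p(y|x,z)\,dF_1 \,\log \frac{q_0(y|x)}{d_0(y)} \Big] \\
&= a + b \ \text{(say)}.
\end{aligned}$$

By choosing a sequence $\alpha_n \downarrow 0$ and using weak convergence of $(1-\alpha_n)\,dF_1 + \alpha_n\,dF_2$ to $dF_1$,

$$a = \int i(z;G,F_1)\,dF_2 - I(G;F_1)$$

and

$$b = \frac{d}{d\alpha} \sum_{x,y} p(x) \int p(y|x,z)\,dF_1 \,\log \frac{q_\alpha(y|x)}{d_\alpha(y)} \quad \text{at } \alpha = 0.$$

Taking the derivative,

$$b = \sum_{x,y} p(x) \int p(y|x,z)\,dF_1 \left\{ \frac{d_\alpha(y)}{q_\alpha(y|x)} \cdot \frac{1}{d_\alpha(y)^2} \Big[ d_\alpha(y) \int p(y|x,z)\,(dF_2 - dF_1) - q_\alpha(y|x) \sum_{x'} p(x') \int p(y|x',z)\,(dF_2 - dF_1) \Big] \right\}$$

where $d_\alpha(y) = \sum_x p(x) \int p(y|x,z)\,[(1-\alpha)\,dF_1 + \alpha\,dF_2]$.

After some algebraic manipulation it can be shown that $b \to 0$ as $\alpha \downarrow 0$.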


Appendix  B 


Here we consider a communication game with two players, player A who chooses
an input distribution $r$ on the $M$-ary input alphabet, and player B who chooses
the $M \times L$ transition probability matrix. Let $X$ and $Y$ denote the input and
output random variables respectively and let $n_i$ denote the distribution of the
random variable associated with the conditional density $p(y|x_i)$. Let the set of all
feasible $a$’s ($= (n_1, \ldots, n_M)$) be compact. The channel $p(y|x)$ is a function of $a$.
Assume this function is linear and that for a choice of $n_i = n$, $i = 1, \ldots, M$ the channel chosen is symmetric. Let $I(r,a) = I(X;Y)$ when A’s choice
is $r$ and B’s choice is $a$. Let $a$ be constrained by $f_i(n_1, \ldots, n_M) \le c_i$,
$i = 1, \ldots, c$ where $f_i$ is a convex, symmetric function of $n_1, \ldots, n_M$, i.e. $f_i$ is
invariant under any permutation of $n_1, \ldots, n_M$. Then a saddle-point strategy exists
for both players and for player A it is to choose a uniform distribution on the
input and for player B it is to choose all the components of $a$ equal, i.e. there
exists $a^*$ with all its components equal such that

$$I(r, a^*) \le I(r^*, a^*) \le I(r^*, a)$$

where $r^*$ corresponds to the uniform input distribution.

Proof: Step 1: $I(r, a^*) \le I(r^*, a^*)$

This follows from the fact that the mutual information between the input and the
output of a symmetric channel is maximized by the uniform distribution.

Step 2: $I(r^*, a^*) \le I(r^*, a)$

Since $I(X;Y)$ is a convex function of $p(y|x)$ which is linear in $a$, $I(r,a)$ is convex
in $a$. Moreover, given the form of the constraints the set of feasible $a$’s is a convex
set.

Now for any $\epsilon > 0$ let $\inf I(r^*, a) + \epsilon$ be achieved at some $a_1 \ne a^*$. Then we
show $I(r^*, a^*) \le I(r^*, a_1)$ proving that the minimum is also achieved at $a^*$. The
use of a uniform distribution on the input and the symmetry of the constraints
implies that for any permutation $\sigma$ of $a_1$ ($a_1^\sigma$ say) we have a new channel $p_\sigma(y|x)$
which involves just a relabelling of the inputs of the original channel. The mutual
information $I(r^*, a_1)$ is equal to $I(r^*, a_1^\sigma)$. Now consider all the $M!$ permutations
of $a_1$, $a_1^\sigma$, $\sigma \in T$ (all the permutations are not distinct but it does not matter).
Take the convex combination $\frac{1}{M!} \sum_{\sigma \in T} a_1^\sigma = a^*$ (say). Every component of $a^*$ is
equal to $\frac{1}{M} \sum_i n_{1i}$. Also from the convexity of $I(r^*, a)$ w.r.t. $a$ we know that

$$I\Big(r^*, \frac{1}{M!} \sum_{\sigma \in T} a_1^\sigma\Big) \le \frac{1}{M!} \sum_{\sigma \in T} I(r^*, a_1^\sigma) = I(r^*, a_1).$$

Therefore

$$I(r^*, a^*) \le I(r^*, a_1)$$

and hence $\inf I(r^*, a) + \epsilon$ is achieved at $a^*$ too. The result then follows from the
observation that $I(r, a)$ is concave in $r$.
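The permutation-averaging argument of Step 2 can be illustrated numerically. The sketch below rests on an assumed channel family that is not taken from the paper: input $i$ is replaced by uniform noise with probability `a[i]`, which makes $p(y|x)$ linear in $a$ and symmetric when all components are equal. For this particular family a permutation of $a$ corresponds to relabelling inputs and outputs together, which leaves the mutual information under a uniform input unchanged.

```python
import itertools
import math

M = 3  # illustrative alphabet size

def channel(a):
    """Hypothetical linear map a -> p(y|x): input i becomes uniform noise w.p. a[i]."""
    return [[(1 - a[i]) * (1.0 if y == i else 0.0) + a[i] / M for y in range(M)]
            for i in range(M)]

def mutual_information(p_x, ch):
    """I(X;Y) in nats for an input pmf p_x and ch[x][y] = p(y|x)."""
    p_y = [sum(p_x[x] * ch[x][y] for x in range(M)) for y in range(M)]
    return sum(p_x[x] * ch[x][y] * math.log(ch[x][y] / p_y[y])
               for x in range(M) for y in range(M)
               if p_x[x] > 0 and ch[x][y] > 0)

UNIFORM = [1.0 / M] * M
A1 = [0.9, 0.3, 0.0]                      # an arbitrary asymmetric jammer choice

perms = [list(p) for p in itertools.permutations(A1)]
values = [mutual_information(UNIFORM, channel(p)) for p in perms]

# Average of all M! permutations: every component equals the mean of A1.
a_star = [sum(p[i] for p in perms) / len(perms) for i in range(M)]
i_star = mutual_information(UNIFORM, channel(a_star))
```

All entries of `values` coincide, and `i_star` does not exceed them, which is the inequality $I(r^*, a^*) \le I(r^*, a_1)$ for this example.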
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